Lexical Semantic Analysis in Natural Language Text
Abstract
Computer programs that make inferences about natural language are easily fooled by the often haphazard relationship between words and their meanings. This thesis develops Lexical Semantic Analysis (LxSA), a general-purpose framework for describing word groupings and meanings in context. LxSA marries comprehensive linguistic annotation of corpora with engineering of statistical natural language processing tools. The framework does not require any lexical resource or syntactic parser, so it will be relatively simple to adapt to new languages and domains. The contributions of this thesis are: a formal representation of lexical segments and coarse semantic classes; a well-tested linguistic annotation scheme with detailed guidelines for identifying multiword expressions and categorizing nouns, verbs, and prepositions; an English web corpus annotated with this scheme; and an open source NLP system that automates the analysis by statistical sequence tagging. Finally, we motivate the applicability of lexical semantic information to sentence-level language technologies (such as semantic parsing and machine translation) and to corpus-based linguistic inquiry.
Similar resources
Producing a Persian Text Tokenizer Corpus Focusing on Its Computational Linguistics Considerations
The main task of tokenization is to divide the sentences of a text into their constituent units and remove punctuation marks (periods, commas, etc.). Each unit is a continuous lexical or grammatical written string that forms an independent semantic unit. Tokenization occurs at the word level, and the extracted units can be used as input to other components such as a stemmer. The requirement to create...
UNL Based Bangla Natural Text Conversion - Predicate Preserving Parser Approach
Universal Networking Language (UNL) is a declarative formal language used to represent semantic data extracted from natural language texts. This paper presents a novel approach to converting Bangla natural language text into UNL using a technique known as the Predicate Preserving Parser (PPP). PPP performs morphological, syntactic and semantic, and lexical analysis of text synchronou...
Computing Lexical Cohesion as a Tool for Text Analysis
Recognizing coherent structure of a text is an essential task in natural language understanding. It is necessary, for example, to resolve anaphora, ellipsis, and ambiguity. One of the dominant factors of coherence of the text structure is lexical cohesion, namely the dependency relationship between words based on associative relations in common knowledge. This thesis proposes an objective and c...
Preferred Lexical Access Route in Persian Learners of English: Associative, Semantic or Both
Background: Words in the Mental Lexicon (ML) construct semantic fields through associative and/or semantic connections, with a pervasive native-speaker preference for the former. Non-native preferences, however, demand further inquiry. Previous studies have revealed inconsistent Lexical Access (LA) patterns due to limitations in methodology and response categorization. Objectives: To f...
Generating Lexical Analogies Using Dependency Relations
A lexical analogy is a pair of word-pairs that share a similar semantic relation. Lexical analogies occur frequently in text and are useful in various natural language processing tasks. In this study, we present a system that generates lexical analogies automatically from text data. Our system discovers semantically related pairs of words by using dependency relations, and applies novel machine...